Comparison of CT derived body composition at the thoracic T4 and T12 with lumbar L3 vertebral levels and their utility in patients with rectal cancer

Background: Computed tomography (CT) derived body composition measurements of sarcopenia are an emerging form of prognostication in many disease processes. Although the L3 vertebral level is commonly used to measure skeletal muscle mass, other studies have suggested using other vertebral levels. This study was performed to assess the variation and reproducibility of skeletal muscle mass at vertebral levels T4, T12 and L3 in pre-operative rectal cancer patients. If thoracic measurements were equivalent to those at L3, this would allow body composition comparisons in a larger range of cancers where lumbar CT images are not routinely acquired.
Research methods: Patients with stage I-III rectal cancer undergoing curative resection from 2010 to 2014 were assessed. CT-based quantification of skeletal muscle was used to determine skeletal muscle cross-sectional area (CSA) and skeletal muscle index (SMI). Systematic differences between the measurements at L3 and those at the T4 and T12 vertebral levels were evaluated using percentile rank differences, to assess the distribution of differences, and ordinary least product (OLP) regression, to detect and distinguish fixed and proportional bias.
Results: Eighty eligible adult patients were included. The distribution of differences between T12 SMI and L3 SMI was wider than that between T4 SMI and L3 SMI. There was no fixed or proportional bias with T4 SMI, but proportional bias was detected with T12 SMI measurements. T4 CSA duplicate measurements had higher test-retest reliability: the coefficient of repeatability was 34.10 cm2 for T4 CSA vs 76.00 cm2 for T12 CSA. With L3 as the reference, the median difference in annotation time was 0.85 min for T4 measurements and -0.03 min for T12 measurements. Thirty-seven patients (46%) had evidence of sarcopenia at the L3 vertebral level, with males exhibiting higher rates of sarcopenia. However, there was no association between sarcopenia and post-operative complications, recurrence or hospital length of stay (LOS) in patients undergoing curative resection.
Conclusions: Quantifying skeletal muscle mass at the T4 vertebral level is comparable to measures achieved at L3 in patients with rectal cancer, although annotation time for T4 measurements is longer.


Introduction
Loss of muscle mass and function is an age-related pathological process and in extreme cases is referred to as sarcopenia, which is associated with decreased survival in cancer patients [1][2][3]. As body composition measurements of sarcopenia can serve as markers of overall health, they may play an important role in the prognostication of disease processes. The third lumbar vertebral level (L3) is the gold standard level for body composition assessment in rectal cancer patients, but for other cancer types thoracic CT slices have been used for body composition analysis, as their CT imaging may not include slices from the lumbar region [4]. To compare the effect of body composition changes between different cancers, it is vital to be able to use the same vertebral level in the comparison. Body composition analysis is almost exclusively performed opportunistically using CT scans acquired as part of routine care. Unfortunately, very little of the literature has compared the reproducibility and ease of measurement between lumbar and thoracic CT slices.
Our primary objective was therefore to assess the variation in skeletal muscle mass at thoracic 4, thoracic 12 and lumbar 3 regions in patients with colorectal cancer (CRC) prior to surgery. Using L3 measurements as a reference, we chose to compare the body composition measurements and patient clinical outcomes at the fourth and twelfth thoracic vertebral levels (T4, T12), as these are two of the commonly utilised thoracic levels in the literature [5][6][7]. Our secondary objective was to determine the repeatability and measurement time of body composition assessments at these three vertebral levels.

Methods
The study was performed according to the Declaration of Helsinki and the International Conference on Harmonisation Guidelines for Good Clinical Practice, and was approved by the local institutional ethics committee (Quality Assurance Project Number: Quality Assurance 2020.24; ERM ID Reference Number: 63907).
A retrospective analysis was performed on rectal cancer patients from a pre-existing, prospectively maintained database (ACCORD). Patients were identified over a 5-year period from January 2010 to December 2014 at a single tertiary centre, Footscray Hospital, Western Health, Melbourne, Australia. The inclusion criteria were: (1) a preoperative initial staging CT scan and (2) treatment for rectal cancer with curative surgical intent. The exclusion criteria were: (1) CT examination not performed at our centre, (2) staging scans of the chest not performed at the time of the abdominopelvic scans and (3) missing initial CT staging scans. A total of 118 patients were identified, of whom 80 were eligible for analysis (Fig. 1). All scans were performed at Western Health prior to any surgical or oncological intervention using a General Electric Lightspeed CT scanner (GE Medical Systems, Milwaukee, WI) and saved as Digital Imaging and Communications in Medicine (DICOM) images for analysis using Slice-O-Matic software (v.5.0, Tomovision, Montreal, Canada). Standard CT parameters of 120 kV, 3 mm thickness and a 512 × 512 matrix were used for all subjects. Two investigators (R.G. and A.A.), trained as per the Alberta protocol [8], completed landmarking and manual segmentation at the L3 level. Landmarking and segmentation were then extended to T12 and T4 by investigator A.A. using single-slice axial CT images acquired at the midpoint of the twelfth and fourth thoracic vertebrae, respectively. Segmentation was undertaken by highlighting muscle groups using anatomical knowledge and existing protocols within the literature [6,7]. The measurements at L3, T4 and T12 were validated by a second reader (R.G.) through random selection of images.
Each abdominal and thoracic axial CT image was segmented to distinguish the various muscle groups at the T4, T12 and L3 levels using anatomical knowledge and tissue-specific Hounsfield Unit (HU) ranges, as highlighted in Fig. 2. The cross-sectional area (CSA, cm2) of the sum of the muscles was computed for each image. Skeletal muscle (SM) cross-sectional areas (cm2) were calculated using the standard HU range for skeletal muscle (-29 to +150 HU), chosen on the basis of previous recommendations [9]. The skeletal muscle index (SMI, cm2/m2) was determined by normalising the muscle area for the patient's height in metres squared, similar to body mass index (by use of the Mosteller formula) [10], i.e. SMI = CSA / (height in metres)^2. Both readers were blinded to patient clinical status and outcomes.
Fig. 2 (caption, partial): Transverse CT image at T4 highlighting the segmented total area (SMI 32.9 cm2/m2). c Transverse CT image at T12 highlighting the segmented total area (SMI 18.2 cm2/m2). d Transverse CT image at L3 highlighting the segmented total area (SMI 28.2 cm2/m2).
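As an illustration only, the HU thresholding and height normalisation described in this paragraph can be sketched as follows. This is not the Slice-O-Matic workflow used in the study; the function names, the representation of a slice as a nested list, and the pixel-spacing argument are assumptions made for the example.

```python
def muscle_area_cm2(hu_slice, pixel_spacing_mm, hu_min=-29, hu_max=150):
    """Area (cm2) of pixels whose HU value lies within [hu_min, hu_max].

    hu_slice: 2D list of Hounsfield Unit values for a region that has
        already been outlined by a reader; pixels outside the outlined
        region are None (hypothetical representation).
    pixel_spacing_mm: (row_mm, col_mm) pixel size, assumed to be taken
        from the image header.
    """
    pixel_area_mm2 = pixel_spacing_mm[0] * pixel_spacing_mm[1]
    n_pixels = sum(
        1
        for row in hu_slice
        for hu in row
        if hu is not None and hu_min <= hu <= hu_max
    )
    return n_pixels * pixel_area_mm2 / 100.0  # 100 mm2 = 1 cm2


def skeletal_muscle_index(csa_cm2, height_m):
    """SMI (cm2/m2) = CSA (cm2) divided by height (m) squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return csa_cm2 / height_m ** 2


# Hypothetical patient: CSA of 150.0 cm2 and height of 1.75 m.
example_smi = skeletal_muscle_index(150.0, 1.75)
```

The default HU limits are the skeletal muscle range quoted in the text; everything else in the sketch is illustrative.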
In our study we utilised the sex-specific cut-offs defined by Prado et al. for L3 vertebral level skeletal muscle index (SMI) associated with mortality, ascertained by optimum stratification. These were 52.4 cm2/m2 for men and 38.5 cm2/m2 for women; patients below these values were classified as having sarcopenia [11].
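The classification rule above is simple enough to state as code. The sketch below uses the two cut-off values quoted in the text; the function and dictionary names are hypothetical.

```python
# Sex-specific L3 SMI cut-offs (cm2/m2) as quoted in the text [11].
PRADO_L3_SMI_CUTOFFS = {"male": 52.4, "female": 38.5}


def is_sarcopenic_l3(smi_l3, sex):
    """Return True when the L3 SMI is below the sex-specific cut-off."""
    if sex not in PRADO_L3_SMI_CUTOFFS:
        raise ValueError("sex must be 'male' or 'female'")
    return smi_l3 < PRADO_L3_SMI_CUTOFFS[sex]
```

Note that the same SMI can be classified differently by sex: a value of 50.0 cm2/m2 falls below the male cut-off but above the female one.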

Statistical analysis
Descriptive statistics included mean and standard deviation or median and inter-quartile range (IQR), as appropriate. We compared SMI measurements at T4 and T12 against our reference at L3 by first computing a percentile for each ranked difference between T4 SMI and L3 SMI and between T12 SMI and L3 SMI. Folded empirical cumulative distribution curves (mountain plots) were then generated: for all percentiles over 50, the percentile was folded (y-axis = 100 - percentile), and the resulting values were plotted against the differences (x-axis). The mountain plot provides information about the distribution of the differences [12]. As correlation metrics are measures of linear association rather than agreement, measurement errors (T4 SMI and T12 SMI against the reference L3 SMI) were analysed using ordinary least product (OLP) regression to uncover systematic differences and, in particular, to detect and distinguish fixed and proportional bias [13][14][15][16]. To quantify test-retest reliability for duplicate measurements, we used the coefficient of repeatability (CR) [17,18]. All tests were two-sided, and p < 0.050 was considered significant. Statistical analyses were performed using Systat v12 (Systat Software, Inc., Chicago, IL, USA) for OLP regression; MedCalc Statistical Software version 20.116 (MedCalc Software Ltd, Ostend, Belgium) for folded empirical cumulative distribution plots; and StatsDirect v. 3.0 (StatsDirect Ltd, Cheshire, UK) for the coefficient of repeatability.
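The folding step that turns ranked differences into a mountain plot can be sketched as below. This is a minimal illustration, not MedCalc's implementation: the percentile convention (rank divided by n + 1) and the function name are assumptions, and other conventions would shift the points slightly.

```python
def mountain_plot_points(method, reference):
    """Return (difference, folded percentile) pairs for a mountain plot.

    Differences are method minus reference. Each ranked difference gets a
    percentile; percentiles above 50 are folded to 100 - percentile so the
    curve peaks near the median difference.
    """
    if len(method) != len(reference) or not method:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = sorted(m - r for m, r in zip(method, reference))
    n = len(diffs)
    points = []
    for rank, diff in enumerate(diffs, start=1):
        percentile = 100.0 * rank / (n + 1)  # assumed convention
        folded = 100.0 - percentile if percentile > 50.0 else percentile
        points.append((diff, folded))
    return points
```

Plotting the folded percentile (y-axis) against the difference (x-axis) gives the mountain shape: a peak centred away from zero suggests a systematic offset, and long tails indicate large differences in some patients.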

Results
Of the 80 patients included in the study, 21 (26%) were female and 59 (74%) were male, with a mean age of 63.0 ± 13.0 years (range, 30-86 years). The main baseline patient characteristics are summarised in Table 1. The mean CSA was larger at the T4 level (165.3 cm2) than at the T12 (94.4 cm2) and L3 (142.9 cm2) levels, p < 0.001. The mean SMI was also larger at the T4 level (57.8 cm2/m2) than at the T12 (33.1 cm2/m2) and L3 (49.9 cm2/m2) levels, p < 0.001. Whilst females had a larger mean BMI than males (30.6 vs 26.8, p = 0.006), muscle area was significantly larger in men at all vertebral levels, p < 0.001 (Table 2). When we applied the Prado et al. thresholds for sarcopenia (< 52.4 cm2/m2 for males and < 38.5 cm2/m2 for females) [11] to our cohort, 37 patients (46%) had evidence of sarcopenia at the L3 vertebral level, with males exhibiting higher rates of sarcopenia (6 female vs 31 male).
The scatter diagrams in Fig. 3 show that SMI at both T4 and T12 had a linear relationship with measurements from L3. The mountain plots in Fig. 4 demonstrate a larger distribution of differences between T12 and L3 than between T4 and L3. OLP regression was then conducted and showed no fixed or proportional bias between T4 and L3, whereas there was evidence of proportional bias when T12 was analysed against L3 (Table 3).
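To make the bias check concrete, the sketch below computes OLP (geometric mean) regression estimates: the slope is the ratio of the standard deviations carrying the sign of the covariance, and the intercept follows from the means. Fixed bias is suggested when the interval for the intercept excludes 0, and proportional bias when the interval for the slope excludes 1. The percentile bootstrap used here for the intervals is a swapped-in assumption for illustration; the interval method used by the study software is not stated in the text, and all function names are hypothetical.

```python
import math
import random
import statistics


def olp_regression(x, y):
    """Return (slope, intercept) of the OLP regression of y on x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each have non-zero variance")
    sign = 1.0 if sxy >= 0 else -1.0  # slope takes the sign of the covariance
    slope = sign * math.sqrt(syy / sxx)
    return slope, mean_y - slope * mean_x


def bootstrap_olp_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap intervals for the OLP slope and intercept."""
    rng = random.Random(seed)
    n = len(x)
    slopes, intercepts = [], []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            b, a = olp_regression([x[i] for i in idx], [y[i] for i in idx])
        except ValueError:
            continue  # skip degenerate resamples with zero variance
        slopes.append(b)
        intercepts.append(a)
    slopes.sort()
    intercepts.sort()

    def pick(values, q):
        return values[min(len(values) - 1, int(q * len(values)))]

    lo, hi = alpha / 2, 1 - alpha / 2
    return ((pick(slopes, lo), pick(slopes, hi)),
            (pick(intercepts, lo), pick(intercepts, hi)))
```

With `x` as the reference (L3 SMI) and `y` as the comparison level, a slope interval lying entirely above or below 1 would correspond to the kind of proportional bias reported for T12.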
For the interobserver validation of our measurement techniques, two readers (A.A. and R.G.) independently evaluated five randomly selected images at each of the T4, T12 and L3 levels. Duplicate measurements were quantified using the coefficient of repeatability (CR), the value below which the absolute difference between two measurements would lie with a probability of 95%. The larger variability in CSA for the L3-T12 comparison (76.00 cm2) than for the L3-T4 comparison (34.10 cm2) implied that T4 CSA measurements had higher test-retest reliability (Table 4). Two independent readers (A.A. and J.Y.) recorded the time needed to annotate the skeletal muscle of five randomly selected CT images at each level. There was a statistically significant difference in the time required to measure skeletal muscle mass at T4 compared with L3, with T4 taking longer (median difference between L3 and T4: 0.85 min). In contrast, the time required at T12 was similar to that at L3 (median difference: -0.03 min; p = 0.75) (Fig. 5).
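For readers unfamiliar with the coefficient of repeatability, a minimal sketch for paired duplicate measurements is given below. It uses the common Bland-Altman form, CR = 1.96 × √2 × within-subject SD, where the within-subject variance for duplicates is the sum of squared differences divided by 2n. Whether StatsDirect applies exactly this form is an assumption, and the function name is hypothetical.

```python
import math


def coefficient_of_repeatability(first, second):
    """CR for duplicate measurements of the same subjects.

    first, second: paired readings (for example, CSA in cm2 from two
    readers). Returns the value below which the absolute difference
    between two readings is expected to lie with 95% probability.
    """
    if len(first) != len(second) or not first:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(first)
    sum_sq_diff = sum((a - b) ** 2 for a, b in zip(first, second))
    within_subject_sd = math.sqrt(sum_sq_diff / (2 * n))
    return 1.96 * math.sqrt(2) * within_subject_sd
```

A smaller CR means duplicate readings agree more closely, which is why the lower value at T4 is read as higher test-retest reliability.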
Although various cut-offs have been described within the literature, only a very limited number of studies are available and none of the cut-offs has been widely validated at both the T4 and T12 levels [5,19]. As there were no studies comparing the T4, T12 and L3 vertebral levels in a colorectal population, we divided our patients into four groups according to SMI, allowing us to visualise the effects of both the highest and lowest SMI quartiles. This method was similar to that used by Lee et al. in their study comparing T4 with L3 in pre-operative patients undergoing lung transplantation [6].
Sarcopenia was assessed by analysing the SMI from CT images at the T4, T12 and L3 levels and was defined as the lowest quartile of SMI (Q1). This definition generated the following cut-off values: T4 SMI < 50.47, T12 SMI < 28.01 and L3 SMI < 42.36. We then assessed the association between this "newly defined" L3 sarcopenia definition and the established Prado et al. [11] definition. There was a statistically significant relationship between these two definitions (chi-square with one degree of freedom = 6.051, p = 0.014). Whilst the comparison between sarcopenic patients defined at T12 and those defined by Prado was statistically significant (P < 0.001), the comparison at T4 was not (P = 0.052).
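The two steps in this paragraph, deriving a lowest-quartile cut-off and testing the association between two binary definitions, can be sketched as follows. The quartile interpolation method and the use of an uncorrected Pearson chi-square are assumptions (the text does not state either), and the function names are hypothetical. For one degree of freedom the p-value can be obtained from the complementary error function.

```python
import math
import statistics


def lowest_quartile_cutoff(smi_values):
    """First quartile (Q1) of SMI; values below it form the lowest quartile."""
    q1, _, _ = statistics.quantiles(smi_values, n=4, method="inclusive")
    return q1


def chi_square_p_value_1df(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column must have a non-zero total")
    chi2 = n * (a * d - b * c) ** 2 / denom
    return chi2, chi_square_p_value_1df(chi2)


# Consistency check on the reported statistic: 6.051 with 1 df.
reported_p = chi_square_p_value_1df(6.051)
```

Evaluating `chi_square_p_value_1df(6.051)` gives a value that rounds to the reported p = 0.014.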
We then investigated whether sarcopenia affected outcomes including medical and surgical complications, development of locoregional recurrence and distant metastases, total hospital length of stay (LOS) and death. Whilst the sarcopenic patients (Q1) showed poorer survival at all three vertebral levels (T4: P = 0.003; T12: P = 0.013; L3: P = 0.013), there was no association with medical and surgical complications, recurrence rates or length of stay (Table 5).

Discussion
This study has shown that CT-derived muscle mass at T4 is comparable to that obtained at the L3 level in patients with rectal cancer. Body composition measurements at T12 showed a larger distribution of differences, and a proportional bias was noted (Fig. 4, Table 3).
In method comparison studies, the appropriate analysis should aim to uncover systematic differences. There are two potential sources of systematic disagreement between methods of measurement: fixed and proportional bias. Fixed bias means that one method gives values that are higher (or lower) than those from the other by a constant amount; proportional bias means that one method gives values that are higher (or lower) than those from the other by an amount that is proportional to the level of the measured variable [13][14][15][16]. Previous studies on T4 and T12 CSA [5,7,20] have reported findings contradictory to our own results. In other publications, high values of Pearson's correlation coefficient (r) have been used as metrics to assess agreement with L3 measurements [21,22]. However, it is well documented in the statistical literature that Pearson's correlation coefficient merely indicates the scatter of values around the line of best fit (Fig. 3), regardless of whether the slope of that line differs from unity (proportional bias) or whether its intercept differs from zero (fixed bias) [13][14][15][16]. It does no more than indicate the strength of the linear association between the x and y variables in the examined population. The information provided by r is, therefore, of no value in detecting systematic biases between methods. Ordinary least product (OLP) regression and/or Bland-Altman plots [21,22] are the appropriate analyses to use in these situations.
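A small numerical illustration of this point is sketched below, using made-up values rather than study data. Two hypothetical comparison methods are constructed from the same reference values, one with a constant offset (fixed bias) and one with a multiplicative offset (proportional bias). Both correlate perfectly with the reference, yet neither agrees with it.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical reference measurements (illustrative values only).
reference = [40.0, 45.0, 50.0, 55.0, 60.0]
fixed_bias = [v + 8.0 for v in reference]         # constant offset
proportional_bias = [1.3 * v for v in reference]  # offset grows with level

r_fixed = pearson_r(reference, fixed_bias)
r_proportional = pearson_r(reference, proportional_bias)

# Differences from the reference expose what r hides.
diffs_fixed = [b - a for a, b in zip(reference, fixed_bias)]
diffs_proportional = [b - a for a, b in zip(reference, proportional_bias)]
```

In both cases r is essentially 1, while the differences are a constant 8 units in the first case and increase from 12 to 18 units in the second, which is the pattern a slope different from unity would capture.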
CT-derived muscle mass at the L3 level is currently the gold standard for body composition analysis and serves as a marker of total body skeletal muscle quantity [9,10,23,24]. Although artificial intelligence (AI) derived 3D total body composition measurements have been developed and described within the literature, very few studies have used validated patient data, and most research linked to clinical data is based only on single-level semi-automated assessments [4]. There is also growing evidence that other levels, including T4 and T12, are equally representative of patient body composition, as shown in conditions such as lung cancer [5,25] and in patients undergoing interventional cardiothoracic and vascular surgery [19,26].
In our study, we found that both total muscle cross sectional area (CSA) and skeletal muscle index (SMI) were significantly higher in males than in females; however female patients had significantly higher BMI compared to our male patients. We found that muscle measurement at all vertebral levels was relatively easy to complete by manual segmentation. When interobserver agreement was compared, the much larger variability seen for L3-T12 CSA compared to L3-T4 CSA implied that T4 CSA measurements had higher test-retest reliability. When we looked at time and consistency to complete body composition analysis between T4, T12 and L3 levels, we found that overall, the T12 vertebra reading required slightly less time to annotate compared to L3. However, the readings at T4 vertebral levels had a more consistent segmentation result between graders.
We also compared muscle quantity between the thoracic and lumbar levels within our patient cohort. We found that the cross-sectional muscle areas were greatest at the T4 level, followed by L3 and then T12. Comparing methods of measurement using percentile ranked differences and ordinary least product regression, we found stronger agreement of both CSA and SMI measured at L3 with T4 than with T12. However, despite this agreement, we noticed that when annotating the muscle groups at the T4 level, some of the patients' skeletal muscles were "cut off" from the CT scan, thereby affecting our ability to complete the total calculation of body composition in those images.
With regards to clinical outcomes, our study did identify evidence of sarcopenia within our cancer cohort; however, we did not find an association between sarcopenia and post-operative complications, recurrence rates or hospital LOS in patients undergoing curative resection. We did however identify that sarcopenia was related to a reduced survival.
The strength of this study is the robust analysis of measurement errors (fixed and proportional bias) as distinct from using correlation coefficients to assess agreements. We recognised that Pearson's (r) value does not provide clinicians with any insight into systematic errors that may be inherent in the measurement obtained with a specific assessment tool. We therefore utilised ordinary least product (OLP) regression for our analysis.
There were several limitations to our work. Our study relied on retrospective data from a single centre, and only 5 scans at each vertebral level were read to determine the interobserver reliability. Whilst our study was able to determine a positive relationship between both the T4 and T12 vertebral levels and L3, half of the images (40 out of 80) acquired at T4 were noted to have "cut offs", where the outer circumference of the muscle mass was missing; a problem also seen in other studies assessing skeletal muscle at the T4 level [5,27]. To account for this "cut off", the "arm" muscles were not counted in the final cross-sectional area. This led to poorer reproducibility, as it was not clear where the boundaries of the muscles around the scapula and arm might be. Previous studies encountered a similar problem and overcame it by including only the pectorals, intercostals and muscles of the back in their CSA measurements [27,28].
The finding of a strong association between T4 and L3 measurements suggests that the thoracic muscles, like those at the lumbar level, would be reasonably representative of total body skeletal muscle quantity. However, consideration must be given to the functions of the muscles at their respective levels. For example, at the T12 and L3 levels, muscles include the rectus abdominis, external and internal obliques and erector spinae (the core muscles). These muscles are thought to initiate full-body functional movement and are essential for stabilising the body in dynamic movements [5]. Although some of these muscles (erector spinae) also extend upwards, the major muscles annotated at the T4 level (the pectoralis, supraspinatus and infraspinatus muscles) primarily function in a different way, through mobilisation of the arm and shoulder girdles, resulting in potentially different CSA and SMI results [5,29].

Conclusions
This study demonstrated that quantifying skeletal muscle mass at the T4 vertebral level is comparable to measures achieved at L3 in patients with rectal cancer, although annotation time for T4 measurements is longer. This new information may be useful in the future to allow clinicians to accurately compare the effect of body composition between different cancers using thoracic levels.